Relabeling Distantly Supervised Training Data for Temporal Knowledge Base Population

Authors

  • Suzanne Tamang
  • Heng Ji

Abstract

We enhance a temporal knowledge base population system to improve the quality of distantly supervised training data and to identify a minimal feature set for classification. The approach first uses multi-class logistic regression to eliminate individual features based on the strength of their association with a temporal label, and then applies semi-supervised relabeling using a subset of human annotations and lasso regression. As implemented in this work, our technique improves performance and incurs notably lower computational cost than a parallel system trained on the full feature set.
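
The relabeling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a binary label (+1/-1) instead of the paper's temporal labels, uses a toy feature matrix invented for illustration, and fits the lasso with plain coordinate descent. The names `fit_lasso`, `relabel`, `human_rows`, and `distant_rows` are hypothetical. The multi-class logistic regression feature-elimination step is not shown.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_lasso(rows, labels, lam, sweeps=50):
    """Lasso linear regression via cyclic coordinate descent on centered data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    y_mean = sum(labels) / n
    xc = [[r[j] - means[j] for j in range(d)] for r in rows]
    yc = [y - y_mean for y in labels]
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                xc[i][j] * (yc[i] - sum(w[k] * xc[i][k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w, means, y_mean


def relabel(rows, w, means, y_mean):
    """Assign +1 or -1 to each row from the sign of the fitted lasso score."""
    labels = []
    for r in rows:
        score = y_mean + sum(w[j] * (r[j] - means[j]) for j in range(len(w)))
        labels.append(1 if score >= 0 else -1)
    return labels


# Toy human-annotated subset (assumed data): feature 0 tracks the label,
# features 1 and 2 are only weakly related to it.
human_rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
human_labels = [1, 1, -1, -1, 1, -1]

# Toy distantly supervised examples with possibly noisy labels (assumed data).
distant_rows = [[1, 1, 1], [0, 0, 0], [0, 1, 0]]
distant_labels = [-1, -1, 1]

weights, means, y_mean = fit_lasso(human_rows, human_labels, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
new_labels = relabel(distant_rows, weights, means, y_mean)
changed = [i for i, (old, new) in enumerate(zip(distant_labels, new_labels)) if old != new]

print("selected features:", selected)
print("relabeled examples:", changed)
```

In this sketch the L1 penalty drives the weights of the weakly related features to exactly zero, so the fitted model doubles as a feature selector, and any distantly supervised example whose label disagrees with the model's prediction is relabeled.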

Similar resources

A distant supervised learning system for the TAC-KBP Slot Filling and Temporal Slot Filling Tasks

This paper describes the system implemented by the NLP Group at UNED for our first participation in the Knowledge Base Population track at the Text Analysis Conference (TAC-KBP). For this Slot Filling Task, our approach was to design a distant supervised learning system, which was then specialized for the Regular Slot Filling and Full Temporal Slot Filling subtasks. From the initial Knowledge Base and...

Minimally Supervised Event Argument Extraction using Universal Schema

The prediction of events and their participants is an important component of building a knowledge base automatically from text. Typically, the events of interest are domain-specific and not known in advance, and so it is often the case that little or no training data is available to learn the appropriate predictors. In this work, we propose a technique for distantly supervised event argument ex...

Stanford's Distantly-Supervised Slot-Filling System

This paper describes the design and implementation of the slot filling system prepared by Stanford’s natural language processing group for the 2011 Knowledge Base Population (KBP) track at the Text Analysis Conference (TAC). Our system relies on a simple distant supervision approach using mainly resources furnished by the track’s organizers: we used slot examples from the provided knowledge bas...

Applying UMLS for Distantly Supervised Relation Detection

This paper describes first results using the Unified Medical Language System (UMLS) for distantly supervised relation extraction. UMLS is a large knowledge base which contains information about millions of medical concepts and relations between them. Our approach is evaluated using existing relation extraction data sets that contain relations that are similar to some of those in UMLS.

Distantly Labeling Data for Large Scale Cross-Document Coreference

Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on “distantly-labeling” a data set from which...


Publication date: 2012